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Abstract 


In  this  paper,  we  develop,  analyze,  and  test  a  new  algorithm  for  nonlinear  least-squares 
problems.  The  algorithm  uses  a  BFGS  update  of  the  Gauss-Newton  Hessian  when  some  hueristics 
indicate  that  the  Gauss-Newton  method  may  not  make  a  good  step.  Some  important  elements 
are  that  the  secant  or  quasi-Newton  equations  considered  are  not  the  obvious  ones,  and  the 
method  does  not  build  up  a  Hessian  approximation  over  several  steps.  The  algorithm  can  be 
implemented  easily  as  a  modification  of  any  Gauss-Newton  code,  and  it  seems  to  be  useful  for 
large  residual  problems. 
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Nonlinear  least  squares,  quasi-Newton  methods,  least-change  secant  update  methods,  variable 
metric  methods. 


1.  Introduction 


Nonlinear  least-squares  problems  are  frequently  encountered  in  practical  optimization,  and 
they  are  also  of  interest  to  the  algorist  because  of  their  highly  structured  nature.  In  this  paper, 
we  suggest  another  way  to  use  this  structure  in  an  attempt  to  increase  the  efficiency  of  the  trusts 
region-Gauss-Newton  or  Levenberg-Marquardt  algorithm,  More'  (1977),  Dennis-Schnabel  (1983). 

The  algorithm  presented  here  is  inspired  by  NL2SOL,  Dennis,  Gay,  Welsch  (1981a, b),  in 
that  it  chooses  at  each  iteration  whether  to  use  a  Gauss-Newton  quadratic  model  or  a  variable 
metric  augmentation  of  the  Gauss-Newton  model  to  define  the  next  iterate.  The  difference  is  that 
the  variable  metric  augmentation  used  here  requires  less  storage,  less  algebra,  and  less  code  than 
NL2SOL.  However,  it  seems  to  have  no  better  theoretical  justification  than  the  Gauss-Newton 
method.  Still,  it  seems  to  use  fewer  residual  and  Jacobian  computations  than  the  Gauss-Newton 
for  some  large  residual  problems  and  to  require  little  additional  arithmetic  at  each  iteration. 
Conversation  with  NL2SOL  users  encouraged  us  to  undertake  this  research,  and  we  publish  it  now 
in  hopes  that  they  will  find  it  helpful  and  that  our  colleagues  will  find  it  an  interesting  use  of 
secant  updating  ideas. 

Section  2  explains  the  augmented  local  model  in  its  various  forms,  and  points  out  some 
overlap  between  our  ideas  and  those  of  Al-Baali  and  Fletcher  (1983).  Section  3  contains  a  unified 
local  convergence  proof  under  standard  Gauss-Newton-type  assumptions  for  all  combinations  of 
the  methods  presented  here.  Section  4  describes  a  model-switching  strategy  and  the  resulting 
hybrid  algorithm  that  adaptively  decides  whether  to  use  the  Gauss-Newton  model  or  an  augmen¬ 
tation  at  each  iteration.  Section  5  compares  an  experimental  implementation  of  the  algorithms 
suggested  here  to  the  LMDER  implementation  of  the  Gauss-Newton  method  and  the  NL2S1  rou¬ 


tine  from  NL2SOL. 
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2.  The  Augmented  Model. 

Let  F: ft  C  R  P-+R  n  be  continuously  differentiable,  and  consider  the  nonlinear  least-squares 
problem  of  finding  a  local  minimizer  xt  for 


tt*)  =  izf(x)Tf(x)  =  tIKM1))2  ■ 

z  z  *  =  l 


(2.1) 


The  classical  algorithm  for  this  problem  is  the  Gauss-Newton  method  which  can  be  thought  of  in 
two  ways: 

First,  we  can  linearize  F(x)  —  F(xe)  about  the  current  parameter  vector  xc  to  obtain  the 
local  affine  model  for  F(x), 

F{x)  ~  F{xc)  +  Jc{x-xc), 

dfi 

where  Jc  =  J{xc)  =  F'(xc)  =  ( - (*,;))■  Then  we  can  seek  to  improve  xc  by  taking  the  next 

OXj 

estimate  x_  to  be  the  value  of  the  parameter  vector  that  solves  the  linear  least-square  problem 
defined  bv  the  local  affine  model. 


The  sum-of-squares-of-residuals  of  this  model  is 


^\F(xc)  +  Jc{x -*e)]  T\f{xc)+JAx~xc)\  . 


and  it  can  be  viewed  as  a  local  quadratic  model  of  <j>{x )  of  the  form 


(2.2) 


4>{x)  ~  mfN(a:)  =  <t>{xe)  +  v<t>{xe)T(x-xc)+^(x-xe)TJeTJc{x-xe)  .  (2.3) 


A  second  way  to  view  this  local  quadratic  model  is  as  an  approximation  to  the  Newton 


model 


W)  =  ^(*e)  +  V^(*c)7’(*-*e)+y(*-*e)TV20(*c)(*-*c) 


(2.4) 


where 
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vV(ze  )-JcTJc  =  £/.(*c)v2/,(*<)  =  S{xe)  (2.5) 

i=l 

is  approximated  by  the  zero  matrix.  It  is  easy  to  reason  from  either  derivation  that  the  difference 
between  the  two  models  depends  on  the  size  of  the  residuals  F(xe)  and  on  how  nearly  affine  F  is 
in  a  neighborhood  of  xc . 

Aside  from  the  obvious  advantage  that  the  Gauss-Newton  method  has  of  not  having  to  com¬ 
pute  or  make  assumptions  about  the  n  pXp  Hessians  V2/«(*«).  *  =  1,  •••,  n,  it  also  is  guaranteed 
to  generate  descent  directions  as  long  as  Jc  is  of  full  rank.  This  happens  because 
Sj'lmGN{xc )  =  JCTJC  is  positive  definite,  and  it  means  that  the  next  iterate  xGN  can  be  calculated 
by  solving  the  linear  least-squares  problem  associated  with  (2.2)  for  sGN  =x+N-xe.  This  leads 
to  a  useful  simplification  over  Newton’s  method  of  the  problem  of  proceeding  from  a  poor  initial 
guess  when  ^/2<p(xc)  may  not  be  positive  definite  even  though  vV(z.)  is. 

The  major  disadvantage  of  (2.3)  with  respect  to  (2.4)  is  that  neglecting  S(xc)  often  leads  to 
a  significantly  less  accurate  local  model.  It  is  not  surprising  that  this  would  cause  slower  conver¬ 
gence  near  a  local  minimizer  of  a  large  residual  nonlinear  problem;  however  far  from  x„  it  results 
often  either  in  xGN  not  being  an  acceptable  next  iterate  when  is,  or  in  a  smaller  residual 
reduction  than  Newton’s  method  provides. 

NL2SOL  essentially  retained  the  Newton  advantages  without  having  to  compute  S(xc).  It 
did  this  by  using  a  variable  metric  update  method  and  an  adaptive  modeling  technique  to  switch 
between  the  Gauss-Newton  model  and  an  augmented  model  of  the  form 

mf(*)  =  <t>{xc)  +  ’Kj<t>{xc)T{x-xc)+^{x-xc)T[JcTJc  +  Sc]{x-xe)  .  (2.6) 

where  Sc  is  a  variable  metric  approximation  to  S(zc)  and  Jc=J(xe).  Here  as  in  NL2SOL,  the 
decision  is  made  at  the  end  of  the  current  iteration  whether  to  use  the  Gauss-Newton  or  its  vari¬ 
able  metric  augmentation  to  make  the  next  step.  This  decision  is  based  on  a  simple  comparison  of 
the  actual  residual  reduction  4>(xc)-<j>(x+)  to  the  predictions  4>{xc)~  mGN(x+)  and  <f>(xc)-m*( x+). 
The  algorithm  suggested  here  does  not  attempt  to  build  up  a  good  approximation  to  5(tc)  over 
several  iterations;  it  temporizes  a  cheap  rank-2  approximation  if  a  given  iteration  seems  to  call  for 
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it. 

We  complete  this  section  with  a  description  of  the  way  we  suggest  defining  Se  in  (2.6)  and 
with  some  basic  facts  about  this  definition.  We  postpone  a  discussion  of  implementational  details 
until  Section  4. 


Given  xc  ,  x_,  Jc  ,  J_,  Fc  ,  F_,  information  at  the  current  and  previous  iterate, 
Set  s_  =  xe  -  x_  and  0  <  ae  <min{  1 ,  ||  JjFc  ||  }  ; 


Set  either 

*-  =  Jj\  Jcs_  +  ae(Fe  -  F_)\ 
or 

y_  =  Jjjcs.  +  ae{jjFe  -  JlF_), 
or 

y.  =  Jjjcs„+  ac{jJ  -  JT)Fct 
and  either 


qBFGS  _ 


Jj  Jc  *S-  JUc 
s  J  Jj  Jc  «_ 


or 


S°FP  = 


{y_-jjjcsj)yl  +  y-(y--J?JeS-)T  sT{y_~Jjjcs_)y_yI 


T 


(*V)2 


Not«  that  with  any  pair  (2.8),  (2.9),  we  have 


(2.7) 

(2.8a) 

(2.8b) 

(2.8c) 


(2.9a) 


(2.9b) 


{JlJc  +  Sc)s_  =  y_  ,  (2.10) 

and  that  ac  —  0  recovers  the  Gauss-Newton  method.  If  we  take  ac  =  1,  then  we  obtain  the  two 

methods  that  Al-Baali  and  Fletcher  (1983)  call  the  GN-BFGS  and  GN-DFP  methods.  Our  proof 

techniques  can  not  support  always  taking  ac  =  1,  and  so  our  analysis  does  not  apply  to  their 

methods.  Our  numerical  results  suggest  that  ac  —  1  is  not  always  the  best  choice. 


It  is  tedious,  but  not  difficult,  to  show  that  if  Jc  has  lull  rank  and  s  Jj/_  >  0,  then  Hc  =  Jj  Jc  +  Sc 
is  positive  definite  and 


5 


(h?fgY  =  {JjJc  +  sfFGY  =  (jJjc  r1  + 

«-  V- 

y-[s--(J^JeY1y-]ssI 


(sly-)2 

In  fact,  if  any  symmetric  nonsingular  matrix  A  replaces  JjJc  in  (2.9),  then 


(2.11) 


T 


(A  +  srav  =  [  /  -  -^1  aAi  -  I±- 
sty,  sty.) 


T  T 

ysi  \  s.st 


+ 


r  t 

sty,)  sty _ 


and 


(2.12a) 


rp  fp  X  T 

a+s™=\i-*£-)a\i-Z£-\ 

sty.)  I  sty. )  sty _ 


(2.12b) 


3.  Local  Convergence. 

In  this  section,  we  will  present  a  local  convergence  analysis  for  the  quasi-Newton  or 
Newton-like  method  based  on  the  augmented  local  model  discussed  in  the  previous  section.  Our 
major  result,  Theorem  3.4,  will  be  the  same  as  the  standard  local  result  for  the  Gauss-Newton 
method  (see  Dennis  (1977);  indeed,  if  we  always  choose  0  =  0,  then  the  new  method  is  easily  seen 
to  be  the  Gauss-Newton  method.  It  will  be  convenient  to  collect  some  useful  bounds  before  we 
start  the  main  proof.  We  will  always  use  the  /2  norm  for  both  vectors  and  matrices,  and  for 
xGR“,  and  e  >  0,  iV(x;c)  =  (x  gR"  :  ||  x  -x  ||  <  e  }.  For  any  set  D  CR",  D  will  denote  the  norm 
closure  of  D. 


Lemma  8.1.  If  x,y  are  two  vectors  of  order  n  such  that  xry  =  l,  I  is  unit  matrix  of  order  n 
then 

||/-xyr||  =  ||  x  ||  •  ||  y  ||  . 

Proof.  Straightforward. 

Lemma  8.2.  Let  F:Rn  — >Rm  be  continuously  differentiable  in  an  open  convex  set  D  CR”,  and 
let  J  be  Lipschitz  continuous  in  D: 


for  any  x  ,  xgD.  Then. 


||J(x)-/(x)||  <7||*-*|| 


(3.1a) 


0 


II  J{x)T  -  J{*)T\\  <  'll!  *  ~  *11  ,  (3.1b) 

\\F(x)-F(I)-J(I){x-I)\\  <  j||*-*||2,  (3.2) 

and  if  D  is  compact,  then  for  erM  >  max  ||  J(x  )  || , 

i 

\\J(x)TJ{x)-J(x)TJ(x)\\  <  7111*- *||,  (3.3) 

||J(*)rF(:r)-J(z)rF(z)||  <  Ti||*- *||  ,  (3.4) 

for  Ti  =  2-)  <r m  and  t2  =  <*M  +  T'max  ||  F(x  )||. 

i  €D 


Proof.  See  Dennis  and  Schnabel  (1983)  pg.75  for  (3.2).  The  Lipschitz  condition  (3.1b)  follows 
directly  from  the  fact  that  the  l2  norm  of  a  matrix  and  its  transpose  are  the  same.  The  Lipschitz 
condition  (3.3)  follows  because 

\\J(x)TJ{x)-J(x)TJ{x)\\  <  ||y(*)r[/(*)- •/(*)]  II  +  ||[y(F)-J(x)]rJ(i)||  <  2~i<tm\\x-x  ||  , 
where  ay  exists  because  D  is  compact.  Finally,  to  get  (3.4), 

||/(a-)rF(x)-J(J)rF(?)||  <  \\J(x)T[F(x)-F(x)}\\  +  \\[J(x)-J(x)}tF(x)\\ 

i 

<  ffA//ll4*+e(*-*))ll  li*-*llrf©  +  'ill*  -  *INI^(*)II  <  '72II*  -  *11  ■ 
0 

Lemma  S.S.  Let  the  hypothesis  of  Lemma  3.2  hold  and  let  x,ED.  If  J{x,)T J{x,)  is  positive 
definite  with  smallest  eigenvalue  X,,  then  for  any  pE(0,  l),  there  exists  e>0  such  that  for 
*, * E N{x/.()  and  s=x-x, 

sT J(x)t{F{x)-F(x)}  >  ■  (3-5) 

Also,  J(x)TJ(x)  is  positive  definite  with  smallest  eigenvalue  greater  than  pX,  and  satisfies  the 
Lipschitz  condition: 

||  [/(*)M*r-[*'(*)rW  II  <  -is' II*-*  II  .  (3-6) 

for  some  t3  <  - — —5-  . 

(p-K) 

Proof.  Because  J(x)TJ(x)  is  continuous  at  xt,  there  exists  6  >  0  such  that  the  smallest  eigenvalue 
of  J(x)T  J(x)  is  larger  than  p  X,  for  all  ||  x-xt  ||  <  6.  Thus,  J(x)TJ(x)  is  invertible  and 
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|||/(.)r/(«)l-1||  <-±-. 

PA. 

For  any  x, 


(3.7a) 


II  [/(FfJlx)]-1-  \J{x)T  J{x)]-'  ||  <  ||[J(S)M5)]-MlMW5)Mi)-A*)M*)IIMI  WOM*)]-1!! 

<  (~t_)2't,i ll*~z  II 

p  A» 

by  (3.7)  and  (3.3). 


Now  set  e  <  min{6, 


P 

— - }.  So  for  any  p  £R",  x  6  Mx.je),  the  inequality 

I'TVm 


P^.\\p\\2<PTJ{x)TJ{x)p  (3.7b) 

holds.  Now  suppose  x, i£JV(i,;e),  then  from  the  previous  lemmas  and  the  Cauchy-Schwartz- 

Bunyakovskii  inequality,  we  have 


|[J(x)«]r[F(x)-F(x)]-[/(xHr[J(x)<]|  <<rw|MI  \r\\°\\* 

<  y^j^f[||*-*.||  +  ll*-*,||]  II*  II2 
<  7^A/t||«||2  <  ~Y~  '  II*  II2  • 

Therefore, 


[•J(*)«]3V(*)«]-^1IMI2  <  [/(?)«]r[F(x)-F’(x)]  <  [J(x)«]r[y(x)«]  +  -^-||fi||2 


Hence 


Theorem  S.4.  Let  the  hypothesis  of  Lemma  3.3  hold  and  assume  that  x,  is  the  only  local  minim- 
izer  of  4>  which  is  convex  in  D .  Assume  that  for  some  7,  <  X,  and  every  x  (ED  , 

||[/(x)-/(x,)]7'F(x,)||  <7.  II*  — a:,  ||  •  (3.8) 

<7 

Let  r£(- — ,  1).  Then  there  exists  e  >0,  such  that  for  all  J0£iV(i,;(),  the  following  sequence  gen- 
X, 

erated  by  any  combination  of  the  Gauss-Newton  method  with  its  BFGS  and  DFP  augmentations 
is  well  defined  and  converges  to  x,  with  ||x,*+1-x,  ||  <  r||x*-x,  ||: 
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*i  =  *o-{Joh)  lJoF0 
For  k  =  1,2,3, DO 

«*-i  =  *k~*k-i  ,  choose  0  <  ak  <  min{l,  ||/*rF*||}  ; 


Vk- 1 


Jk{Jk8k~]  +  ak {Fk  —  Fk-l )]  , 

or 

JkJksk- 1  +  Olk(JkFk  -  Jk-jFk_})  , 

or 

•4r  +  <**(  Jf  -  Jk-i  )Fk ; 


IgBFGS  _ 


y*-i  yl- 1 

«*ri  y*-i 


Jk  Jk  sk-l  *k- 1  JjtT-4 
®£l  Jk  Jk  sk-l 


or 


(J  _  qDFP _  (Vk-l  Jk  Jksk-\)yk-i  +  y*-l(y*-l  JkJksk-\)T  sk-\  (jhr-1  ~  Jk  Jk  sk-l)yk-iyk-l 

(sk-iyk-if  1 


sYiVk-1 


Hk  —  Jk  Jk  +  Fk  ; 


xk+1  =  xk-HklJkTFk  ; 

END. 

Furthermore,  if  7,  =0  or  |]F(a:,)||  =  0,  then  the  convergence  is  at  least  q-quadratic. 

Proof.  First  we  establish  some  more  useful  inequalities.  Choose 

1), 

and  take  €[>0  to  be  as  chosen  by  Lemma  3.3.  From  (3.4)  and  the  fact  that 
J{x.)TF{x,)  =  JjF ,  =  0,  we  see  that  if  xEN(x,;e1),  then 

|]/(a:):rF(z)||  <  q2  ||z-*J|  .  (3.9) 


Furthermore,  by  (3.2),  (3.3),  (3.8), 


0 


\\J(*)rF(x)-jJj,{x-x,)\\  <  ||/(*)||  ||F.-F(x)-J(xKx.-x)|| 

+  \\J{x)TJ{x)-Jjj.\\  •||x-x,||+||(/(x)-J.)rF.|| 

<  i|x-x,  ||2  +  7j  ||x-x,  [|2  +  %  ||x-x,  ||  (3.10) 

=  ('r4ll*-x.||+7,H|x-x,||  , 

where  -7*  =  a M  y  +  TTi . 

Similarly,  for  x_,  x.eJVfi.jf,)  and  ore,  defined  as  in  (2.7),  (2.8a),  we  get  from  (3.3), 
(3.9),  and  the  choice  of  ac  <  ||  jjFc  || ,  that  for  ^  =  7l  +  l2  max{4,  12}, 

1 1  y-~  JjJ.*- 1 1  =  \\JlJc*-  +  <*cJl{Fc-F_yjJj.S_\\ 

<  ||«-||+ae  ||7ir(Fe-F_)|| 

<  ||  JJJC  -  JjJ,  ||  •  ||  ._||  +  ||  JJFC  ||  •  ||  Jj{Fc-F_)  ||  (3  Ua) 

—  7l  II5-  II  '  WXc  ~X>  II  +72^Af  II5- II  '  Ike  -X,  II  <  75  II  5-ll  •  ||xc  -x,  ||  , 
and  for  jr_  defined  by  (2.8b), 

l|j/--^.5-ll  =  ||  JlJcs.  +  ac{JjFc-JlF_)-jJj,s_\\ 

<  ||  JjJc  -  JjJ.  INI ||  +  HJ^P.  ||  •  ||  JjFc  -JT_F_  II  (3.11b) 

ll5-ll  -11^-^11+72  H5-||-|ke-X.||  <75||5-||  -||XC-X,||  , 
and  for  y_  defined  by  (2.8c), 

1 1 y.-  JUs- 1 1  =  1 1  J?Jc 5-  +  JT.)Fc  -  JjJ.s.  1 1 

<  \\J?Jc-J?J,\\-\\S-\\+\\J?Fe\\-\\(J?-J?)F'\\  (3.11c) 

<7i  ll5-ll  -||^-xj|  +-7  ||«_||  -||FC  ||  -72-||xc-x,  || 

—  7i  II  5-||  ‘  II  xe  -x,  ||  +72  II 5- 1|  ’  II  xc  -  x,  ||  <'fc||a_||-||*t-*.||  . 

Also,  ac  <  1 ,  so  for  the  three  different  y_  definitions  and  ^  +  max{a^,  72}> 

IIJ'-H  <  ll^ll  •H5-||+ac||Jcr[Fc-F_]||  <^|h.||+^||«_||  <  76  ||«_||  ,  (3.12a) 

IU-II  —  II  II  ■  ll5-ll  +«c  WJjFe-JTF^  ||  <  erf,  1 1 «_  II  +72  1 1  s_  II  <  Te  1 1  1 1  ,  (3.12b) 

and 

ll»-ll  -  II  ^  II  ll5-ll  +a'  ll(^r-^-)^]||  <  oli  ||  *_  ||  +72  ||«-||  <  7e  1 1  1 1  •  (3.12c) 

Finally,  for  the  three  y_  definitions,  we  have 

s-y-  =  S-JjJcS-+  ae*TjJ[Fe-F_)  >  sTjJj'S .  >  p\t  ||  «_  ||2  (3.13a) 

from  (3.5)  and  (3.7b),  and 
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•-V-  =  *- + a C»T [jjFc -  JTF_\  >  sljjjcs_  >  pX.||s_||2  , 
since  <j>  is  convex  on  D.  See  Ortega-Rheinboldt  (1970),  pg  86.  In  the  same  manner, 


(3.13b) 


t-V-  =  *>TJ[JC  e_+acsT[J^-  Jt}Fc  ±  ac  sTJtF_ 

=  8TJ?Jcb_+  <*cst\JJFc  -  JtF_]  +  ac sTJT (Fe  - F.)  (3.13c) 

>  slJjjcs_  >  pX,||s_||2  . 

These,  (3.12),  and  Lemma  3.1  allow  us  to  say  in  either  case  that 


lU- 


ll  =  II 


It  will  also  be  useful  to  have  +  ^77£(l  +  ^7) . 


Ily-INM 

sly. 


(3.14) 


Now  we  prove  the  theorem  by  induction. 


Let  r  € 


l)  and  take 


e  <  min 


7, 

X. 


7, 

X. 


■*r 


^  2  .  +2  ,,  ,  ,  278  74+7g 

T~  +  l3l2  T- +  /77l'372+ 75— Y(l  + 7?)  2' - 

A*  A,  pX,  X, 


Given  any  ar0 £%,;{),  is  nonsingular  by  the  choice  of  tj,  so  ar j  is  well  defined  and  from 

(3.10),  (3.6)  and  (3.9)  we  have 


ll*i-*.ll  —  llzo-ar»-(^oUo)  UjF0|| 

<  WfJ.Y1  II  •  II  J$FQ-jJj0(xo-x,)  II  +  ll(^o)-,-(Wl|  -H^oll 

<  '  [(74 1 1  zo~  z»  1 1 )  +  7»]  Hzo-zJ|  +73llzo-z.||72|lzo-z.ll 

-  ^y’+7s72)€+  Y"]  lU0-**!!  <  r||,0-«.||  <  e  . 

Suppose  IUj-x.ll  <  r  ||  ary_i-a(  ||  for  j  —  1,2  Then  by  Lemma  3.3,  JkJk  is  positive 

definite  and  this  together  with  (3.13)  ensures  that  H k  is  positive  definite  in  either  case;  so  x*+1 
exists.  Since  either  augmentation  gives  the  Gauss-Newton  step  when  ak  =  0,  we  can  concentrate 
on  the  augmented  steps.  First,  we  will  carry  out  the  induction  step  for  the  BFGS  augmentation. 
By  (2.10),  (2.11),  (2.12),  (3.14),  (3.10),  (3.9),  and  (3.11), 
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*k-\Vk-i 


•k-\Vk-i 


1  tk-iVk-i  Sk-iVk-i  ‘k-\Vk-i> 

) 


=  y*-ia*T-1 

»k-\Vk-i 


«k-iVk-i 


+  *^  +  {I-Mk.)TijTjtYl{I_9!Z lfl±h{jTjr}jTFkl 


Sk-iyk-i 


•k-iVk- 


*L\  Vk-i 


*k-iVk-i 


‘k-iVk-i 


,  „  [**-i  (^»r,0  *y*-i]a*-i  ,  a*-i[**-i_(/.T^.r1y*-i]T  ,T  yk- i**Tixll  ,,  rrr,  ,, 

+  m  5=  •  ?= - (* — f - ;l  I ' II II 


Sk-iVk-i 


*  k-iVk-i 


* k-iVk-i 


^  7«l|**-*.||  +  7.  „  ||  2  ,,  ll9 

~  - X -  llX*-*.ll  +^3^2  1 1  *t-*»|  | 


+ 


1  ,  Ttell**-!  II  *  II **-«* ||  *  II **-i  II  ,  ||«*-i  IK II «*-i II  '  II  ** I 


pk\mv 


-  ^T" +  ^T" +  ^372  +  +  77)W  II  || 

A*  *.P 


-T7h2lk*-!T,|| 

(3.15a) 


5:  r  1 1  **  "  z. I 


To  complete  the  induction,  let  us  consider  the  case  when  Hk  is  the  DFP  augmentation  of 
JkJk-  From  (2.12b),  (3.3),  (3.12),  (3.13),  and  (3.14), 


ll^-^.ll  =  \\(I-^^)J^Jk{I-^±)T  +  ^^± 

sk-\Vk-i  Sk-iyk-i 


*  k-iVk-i 


*  k-iVk-1 


=  ||(7-  ^  8±L){jTjk_jTjt){I_y>^I±)T 

sk-\Vk-\ 

{yk~\- Jj J,sk-\)yI-\  ,  yk-i{yk-i-JjJ.Bk-\)T  , ,  yk-isk-i  ,, 

'  T  '  T  V**  T  /  II 

**-i  y*-i 

2  II  tT  T  tT  T  II  i  1 1  ^*-1  1 1  1 1  Vk-l  ~  J,  J,sk-\  1 1  ,  lls*-l||  ||j/fr-l|i  , 

<  7? !|| J*  Jk-J.  J.W  + - r - (!  + - f - ) 


s  k-iVk-i 


Sk-iVk-i 


<  ilhi  +7775(1  +  77))  II  xk-x,  ||  =  7811**-*.  II  . 


Thus 
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and 


\\Hk-JJj.\\  <'T8€<  -f 


Hence  by  the  Banach  Perturbation  Lemma  (Dennis  and  Schnabel  (1983),  pg.45), 

IKW1!! 


I Hk~'\\  < 


i-i 


<  1  1  =  _2 
X,  | 

2 


Finally, 


I **+i-*.  ||  =  \\xk-x,-Hklj[Fk\\ 

<  Y-[(74||**-*.||  +  7.)|  \xk-x,\\  +  78|l**-*.l|2] 

<  T~  [7,  +  (74  +  7s)e]  •  1 1  **  -  *,  II  <  r  ||ar*-ar,||  . 


(3.15b) 


This  completes  the  proof  of  q-linear  convergence.  If  7,  —  0,  then  q-quadratic  convergence  follows 
immediately  from  (3.15).  Of  course,  if  ||F(x,)||  =  0,  then  7,  =  0. 


4.  The  Hybrid  Algorithm. 

In  this  section,  we  will  describe  a  model  switching  strategy  and  give  details  of  a  hybrid  algo¬ 
rithm  that  adaptively  decides  at  each  iteration  whether  to  use  the  Gauss-Newton  model  (2.3)  or  a 
BFGS  augmented  model  (2.6),  (2.9a).  Much  of  the  implementation  follows  MINPACK  and 
NL2SOL.  Except  for  model  switching,  the  basic  algorithm  is  essentially  the  same  as  in  the  subrou¬ 
tine  LMDER  of  MINPACK.  The  model  switching  decision  is  borrowed  from  the  ideas  in  the  sup- 
routine  NL2S1  of  NL2SOL.  Among  the  major  differences  between  our  hybrid  algorithm  and  the 
one  used  in  NL2SOL  are: 

(1)  No  doubling  of  the  trust  radius  is  tried  internal  to  an  iteration. 

(2)  If  five  unsuccessful  steps  are  attempted  in  an  iteration  with  the  currently  preferred  model, 
then  the  algorithm  will  start  the  next  iteration  with  the  other  model. 
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Before  presenting  the  algorithm,  we  make  some  preliminary  remarks  that  will  also  serve  to 
define  the  variables  in  the  algorithm.  The  actual  reduction  and,  the  predicted  reduction  pnd, 
and  the  directional  derivative  dirder  are  computed  as  in  the  subroutine  LMDER.  Moreover,  the 
updating  strategies  for  the  step  bound  A*  and  the  Levenberg-Marquardt  parameter  X  are  also 
taken  from  that  routine. 

The  extra  work  involved  in  trying  the  alternate  model  in  any  iteration  after  the  currently 
preferred  model  step  has  been  computed  is  not  significant.  To  see  this,  suppose  that  the  alterna¬ 
tive  is  the  BFGS  augmented  model.  We  would  already  have  the  factorization 


Jt  =  QR  , 

and  if  we  follow  Goldfarb  (1976),  Dennis-Schnabel  (1981)  or  (1983,  pp. 200-201),  and  define 


Jbfgs  —  Jk  +  uvT  , 
Jks_ 


where  «  = 


,  V yTs~ 

v  — - - ,  and  t  =  ■ 


(4.1) 


ii  wr  ’  y/znnj.' 

then  Hkras  —  Jbfgs  Jbfgs*  and  it  is  cheap  to  obtain  the  QR  decomposition  of  Jbfgs  by  the  fol¬ 
lowing  well  known  means: 


Jbfgs  —  Q  R  +  uvT  ~  Q[R  4-  wvT)  —  Q  Q  R  —  Q'R  . 

Similarly,  in  the  case  that  the  iteration  starts  with  the  BFGS  augmented  model  and  the  alternate 

is  the  Gauss-Newton  model,  the  QR  factorization  of  Jk  can  be  obtained  easily  from  the  QR  fac¬ 
torization  of  Jbfgs ■  F°r  complete  details,  see  Dennis-Schnabel  (1983,  pg.57).  In  the  algorithm 
below,  we  will  use  the  notation  Jt  and  Jc  to  denote  the  Jacobian  or  Jbfgs  35  is  appropriate 
depending  on  which  is  the  alternate  and  which  is  the  currently  preferred  model.  We  will  use  ma 
and  mc  for  the  quadratic  models  themselves. 


The  Hybrid  Algorithm 

(l)  Initialize  Jt  =  l,  Xt=0,  So  =  0=yo>  A*- 

Set  Current_model  =  Gauss-Newton;  Alternate_model  =  BFGS. 
Compute  Fk  =F(xk),  Fnorm  =  ||  Fk  || ,  and  phik  —  ||  Fk  || 2. 
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(2)  Set  first  =  .TRUE. ,  ifail  =  0  and  compute  Jk  —  J[xk). 

Rescale  Dk  if  neccessary. 

If  Current_model  =  Gauss-Newton  Then  Jc  =  Jk 
Else  Jc  =  Jbfgs 

End  If 

(3)  Solve  for  s  to  minimize  ||  Fk  -t-  Jks  || ,  subject  to  ||At«||<A*. 

(4)  Set  x£+i  =xk  +  8  and  pnorm  —  \  \  Dk  s  1 1 . 

Compute  F(x£+i)  and  Fnorml=  ||F(x£+1)||,  phikp=  ||F(x1f+1)||2. 

(5)  Compute:  ared  —  l-(Fnorml/Fnorm)2\ 

pred  =(  ||  Jks  \\  /  Fnorm)2  +  2-{\k^-pnorm  /  Fnorm)2-, 
dtrder  =  -[( 1 1  Jk a  1 1  /Fnorm )2  +  (\k^2pnorm /Fnorm  J2] ; 
rho  = ared /pred. 
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(6)  If  rho  >  .25  Then  (*  Probably  a  good  step.) 

If  rho  >  .75  or  X*  =0  Then  (*  It  was  a  good  step.)  A*+1  =  2- A*  ,  X*  =  X*/2 
Else  Ai+i  =  At  ,  Xi+1  =  X* 

End  If 

Else  (*  Probably  a  bad  step.) 

If  first  =  .  TRUE.  Then 
first  =  .FALSE. , 

Compute  Jt  (*  See  the  discussion  before  the  algorithm.) 

If  \me{xl+1)-phikp\  >  1.5-|m.(xf+1)-p«*p|  Then 
Solve  for  s  to  minimize  ||F^  +  Jas  || 

subject  to  1 1 ||  <  A*. 

Set  xtfh  —xk  +  s  and  pnorm 2=  1 1 Z>* «  ||. 

Compute  F(xfti),  Fnorm2  =  ||F(x^1)||, 
and  phikpp  =  ||F(x^])j|2. 

If  phikpp  <  phikp  Then 

4+i  =4+i, 

F{*t+1)  =  F{xP+l),Jk  =  J„ 

pnorm  —  pnorm2,  Fnorml  —  Fnorm2,  phikp  =  phikpp. 
Go  to  (5)  (*  Check  fit  of  alternate  model.) 

End  If. 

End  If. 

End  If. 

If  ared  >  0  Then  temp  —  .5 

Else  temp  —  .5- dirder /(dirder  +  .5- ared) 

End  If. 

If  (Fnorml  >  10  Fnorm  or  temp  <  .1)  Then  temp  =  .l 
End  If. 

A*+1  =  temp -minlA*  ,  10-pnorm) 

^*+i  =  X*/temp 

End  If. 


(7)  Check  for  convergence. 


16 


(8)  If  rho  <  KT4  Then  (*  Bad  step;  compute  another.) 
ifail  =  ifail  +1 
If  i fail  =  5  Then 

Exchange  model  preferences  for  next  iteration 
Go  to  (2) 

End  If. 

Go  to  (3). 

End  If. 


(9)  Set  zk+l  =  xf+i ,  Fk+1=F(x£+1),  Fnorm  =  Fnorml,  phik  =  phikp. 

If  \m((xk+l)-phikp  |  >  1.5- 1 mt(xk+1)-phikp  |  Then 

Exchange  model  preferences  for  next  iteration 

End  If 

(10)  Set  k  —  k  +  1,  Go  to  (2). 

5.  Numerical  Results. 

In  this  section,  we  will  try  to  compare  a  preliminary  implementation  of  our  hybrid  algo¬ 
rithm  NOXLSQ  with  the  MINPACK  subroutine  LMDER  and  the  NL2SOL  subroutine  NL2S1. 
Our  implementation  is  as  close  as  possible  to  LMDER  since  we  felt  that  the  hybrid  algorithm 
should  be  -viewed  as  a  way  to  modify  the  Gauss-Newton  method  on  large  residual  problems.  In 
fact,  we  didn’t  even  tune  any  of  the  constants  involved  in  the  LMDER  code  to  find  values  that 
gave  better  performance  with  our  hybrid  algorithm.  All  the  runs  were  done  in  double  precision  on 
the  Data  General  MV-1000  at  IMSL  using  stopping  criteria  of  1.0D-8  on  the  relative  stepsize  or 
the  relative  change  in  the  sum  of  squares.  The  26  problems  used  in  our  test  were: 
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Problems  1-18  were  the  MINPACK-1  nonlinear  least-squares  test  problems. 

Problems  19-26  were  the  NL2SOL  test  problems  that  were  not  included  in  the  MINPACK 

test  set.  They  were:  19.)  Woods’  function,  20.)  Zangwill’s  function,  21.)  Engvall’s  function, 

22.)  Branin’s  function,  23.)  Beale’s  function,  24.)  Cragg’s  and  Levy’s  function, 

25.)  the  Davidon  1  function,  26.)  Madsen’s  function. 

The  results  for  LMDER  and  NONLSQ  were  indentical  for  most  of  the  problems  and  so  we 
only  present  the  results  for  Problems  7,  8,  9,  14,  15,  18,  26.  These  results  are  summarized  in 
Tables  1-8  where  we  follow  the  practice  of  using  the  ‘standard’  starting  point  for  the  problem  the 
first  time  the  problem  is  listed,  ten  times  that  standard  point  if  the  problem  is  listed  again,  one 
hundred  times  the  standard  starting  point  if  there  is  a  third  listing,  etc.  For  the  problems  listed, 
we  continue  multiplying  the  standard  initial  guess  by  powers  of  ten  until  the  intial  point  is  too 
large  for  us  to  even  be  able  to  compute  the  initial  residuals.  Convergence  was  never  a  problem. 

These  numerical  results  support  the  conclusions  that  for  poor  initial  guesses  and  nonzero 
residuals,  the  method  suggested  here  may  enjoy  some  advantages  over  the  Levenberg-Marquardt 
method.  Therefore,  further  investigation  seems  justified;  for  example,  we  do  not  allow  internal 
doubling,  and  we  have  made  use  of  the  freedom  in  the  parameter  ac  only  enough  to  show  that  it 
does  affect  performance. 

The  results  in  Table  7  show  that  (2.8c)  with  ac=min{  1,  ||  jjFe  ||  }  is  probably  the  best 


choice  of  secant  condition.  We  included  ac  =  1  because  of  the  relationship  to  ideas  of  Al-Baali 
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and  Fletcher  (1983)  mentioned  in  Section  2.  The  comparison  of  NONLSQ  to  NL2S1  is  much 
harder  due  to  the  differences  in  the  implementation  details,  and  so  we  will  just  present  the  NL2S1 
results  in  Table  2  for  completeness. 

We  hope  that  refinement  of  this  simple  and  computationally  convenient  idea  will  lead  to 
improved  algorithms  for  large  residual  nonlinear  least  squares. 


Table  2:  WL2S0L  Results 
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